A role-free approach to indexing large RDF data sets in secondary memory for efficient SPARQL evaluation

Authors

  • George H. L. Fletcher
  • Peter W. Beck
Abstract

Massive RDF data sets are becoming commonplace. RDF data is typically generated in social semantic domains (such as personal information management [2, 11, 13]) wherein a fixed schema is often not available a priori. We propose a simple Three-way Triple Tree (TripleT) secondary-memory indexing technique to facilitate efficient SPARQL query evaluation on such data sets. The novelty of TripleT is that (1) the index is built over the atoms occurring in the data set, rather than at a coarser granularity, such as whole triples occurring in the data set; and (2) the atoms are indexed regardless of the roles (i.e., subjects, predicates, or objects) they play in the triples of the data set. We show through extensive empirical evaluation that TripleT exhibits multiple orders of magnitude improvement over the state of the art on RDF indexing, in terms of both storage and query processing costs.
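
The two properties named in the abstract (indexing individual atoms rather than whole triples, and ignoring the role an atom plays) can be illustrated with a small in-memory sketch. This is not the authors' implementation: the class name AtomIndex, the three-bucket entry layout, and the use of a Python dict in place of a secondary-memory tree are all assumptions made for illustration only.

```python
from collections import defaultdict


class AtomIndex:
    """Toy role-free index: exactly one entry per atom, whatever its role.

    Hypothetical layout (an assumption, not taken from the abstract):
    each entry keeps three buckets holding the remaining two atoms of
    every triple in which the key atom is subject, predicate, or object.
    """

    def __init__(self):
        # atom -> {"s": [(p, o)], "p": [(s, o)], "o": [(s, p)]}
        self._entries = defaultdict(lambda: {"s": [], "p": [], "o": []})

    def add(self, s, p, o):
        """Register one triple under each of its three atoms."""
        self._entries[s]["s"].append((p, o))
        self._entries[p]["p"].append((s, o))
        self._entries[o]["o"].append((s, p))

    def atom_count(self):
        """Number of index entries, i.e. distinct atoms seen so far."""
        return len(self._entries)

    def _bucket(self, atom, role):
        # Use .get so that a lookup never creates an empty entry.
        entry = self._entries.get(atom)
        return entry[role] if entry else []

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching a pattern; None marks a variable."""
        if s is not None:
            found = [(s, p2, o2) for p2, o2 in self._bucket(s, "s")]
        elif p is not None:
            found = [(s2, p, o2) for s2, o2 in self._bucket(p, "p")]
        elif o is not None:
            found = [(s2, p2, o) for s2, p2 in self._bucket(o, "o")]
        else:
            # No bound atom: enumerate every triple once via subject buckets.
            found = [(a, p2, o2)
                     for a, entry in self._entries.items()
                     for p2, o2 in entry["s"]]
        # Filter on any remaining bound positions.
        return sorted(t for t in found
                      if s in (None, t[0])
                      and p in (None, t[1])
                      and o in (None, t[2]))


index = AtomIndex()
index.add("alice", "knows", "bob")
index.add("bob", "knows", "carol")
index.add("knows", "type", "Property")  # "knows" is now also a subject
print(index.match(p="knows"))
print(index.match(s="knows"))
```

In this sketch the atom "knows" occurs both as a predicate and as a subject, yet it has a single entry; a role-based scheme would typically keep separate structures per role or per triple ordering.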

Similar resources

Towards Efficient SPARQL Query Processing on RDF Data

Efficient support for querying large-scale RDF triples plays an important role in Semantic Web data management. This paper proposes an efficient RDF query engine to evaluate SPARQL queries, where the inverted index structure is employed for indexing RDF triples. We first design and implement a set of operators on the inverted index for query optimization and evaluation. Then we propose a main-t...

RDFMatView: Indexing RDF Data using Materialized SPARQL queries

The Semantic Web aims to create a universal medium for the exchange of semantically tagged data. The idea of representing and querying this information by means of directed labelled graphs, i.e., RDF and SPARQL, has been widely accepted by the scientific community. However, even when most current implementations of RDF/SPARQL are based on ad-hoc storage systems, processing complex queries on la...

SpiderStore: Exploiting Main Memory for Efficient RDF Graph Representation and Fast Querying

The constant growth of available RDF data requires fast and efficient querying facilities of graph data. So far, such data sets have been stored by using mapping techniques from graph structures to relational models, secondary memory structures or even complex main memory based models. We present the main memory database SpiderStore which is capable of efficiently managing large RDF data sets a...

A Tool for Efficiently Processing SPARQL Queries on RDF Quads

We present a tool called RIQ (RDF Indexing on Quads) for efficiently processing SPARQL queries on large RDF datasets containing quads. RIQ’s novel design includes: (a) a vector representation of RDF graphs for efficient indexing, (b) a filtering index for efficiently organizing similar RDF graphs, and (c) a decrease-and-conquer strategy for efficient query processing using the filtering index t...

RIQ: Fast processing of SPARQL queries on RDF quadruples

In this paper, we propose a new approach for fast processing of SPARQL queries on large RDF datasets containing RDF quadruples (or quads). Our approach called RIQ employs a decrease-and-conquer strategy: Rather than indexing the entire RDF dataset, RIQ identifies groups of similar RDF graphs and indexes each group separately. During query processing, RIQ uses a novel filtering index to first id...

Journal:
  • CoRR

Volume: abs/0811.1083

Publication date: 2008